European and African-specific plasma protein-QTL and metabolite-QTL analyses identify ancestry-specific T2D effector proteins and metabolites

Initially focused on the European population, multiple genome-wide association studies (GWAS) of complex diseases, such as type-2 diabetes (T2D), have now extended to other populations. However, to date, few ancestry-matched omics datasets have been generated or further integrated with the disease GWAS to nominate the key genes and/or molecular traits underlying the disease risk loci. In this study, we generated and integrated plasma proteomics and metabolomics with array-based genotype datasets of European (EUR) and African (AFR) ancestries to identify ancestry-specific multi-omics quantitative trait loci (QTLs). We further applied these QTLs to ancestry-stratified T2D risk to pinpoint key proteins and metabolites underlying the disease-associated genetic loci. We nominated five proteins and four metabolites in the European group and one protein and one metabolite in the African group to be part of the molecular pathways of T2D risk in an ancestry-stratified manner. Our study demonstrates that the integration of genetic and omic studies of different ancestries can be used to identify distinct effector molecular traits underlying the same disease across diverse populations. Specifically, in the AFR proteomic findings on T2D, we prioritized the protein QSOX2, while in the AFR metabolomic findings, we pinpointed the metabolite GlcNAc sulfate conjugate of C21H34O2 steroid. Neither of these findings overlapped with the corresponding EUR results.


Introduction
Human genetics studies have mainly focused on participants of European ancestry 1. However, there has been a recent increase in studies that include multiple ancestral backgrounds 2-4. With these efforts, human geneticists have now published numerous studies on complex diseases that encompass multiple populations or specifically target non-European populations. For example, two of the largest studies on type-2 diabetes (T2D) included participants of five different ancestries 5,6: Europeans, Africans, Hispanics, East Asians, and South Asians. However, ancestry-matched deep molecular phenotyping datasets have rarely been used in post-GWAS analyses, namely colocalization 7,8, Mendelian randomization 9,10, or trait imputation (such as transcriptome-wide association studies (TWAS) for RNA expression 11,12). These approaches are pivotal for linking variants to genes and, further, to pathways.
Recent publications have included single-layer omics genetic studies that incorporate participants from multiple ancestries rather than just a single ancestry. In 2022, two proteomic studies were published: (i) Zhang et al. 13 investigated plasma proteomic data from participants of European and African ancestries within the Atherosclerosis Risk in Communities (ARIC) cohort; (ii) Schubert et al. 14 examined plasma proteomic data from African American, East Asian, European, and Hispanic participants within the Multi-Ethnic Study of Atherosclerosis (MESA) cohort. Both studies applied the TWAS framework to study the cis-only protein effects on diseases. Consequently, they coined the term PWAS specifically for studying the proteome. Additionally, a more recent study, published in 2023 with the largest-to-date

Genetic architecture of the plasma proteome in participants of African and European ancestry
To build genetic maps of the plasma proteome and metabolome, we performed pQTL and metabolite-QTL (mQTL) analyses separately (Fig. 1; Figure S3A-F, Table S2-7). To obtain the ancestry-stratified maps, we further split the input data into African and European sub-cohorts. In summary, we utilized an aptamer-based assay (SOMAscan 7k platform 18) to measure the multi-ancestry proteomics data and a mass-spectrometry assay (Metabolon HD4 platform 19) for the metabolomics data of the same cohort. Following quality control procedures for the omics data and integration with array-based post-imputation genotype data, we constructed four maps: i) AFR pQTL (414 participants and 6,907 proteins), ii) EUR pQTL (2,338 participants and 6,907 proteins), iii) AFR mQTL (417 participants and 1,413 metabolites), and iv) EUR mQTL (2,392 participants and 1,483 metabolites). We derived the study-wide significance threshold from the genome-wide significance threshold by further accounting for the number of independent features within each separate map (see Methods for more details).
To identify genetic variants associated with the plasma proteome in individuals of African ancestry, we conducted pQTL mapping on African ancestry participants (Fig. 2A). Of 6,907 proteins that passed QC, we identified 881 proteins with 954 study-wide significant pQTLs (Table S11). Among these findings, 420 pQTLs were classified as cis, while 534 were trans-pQTLs. Consistent with previous studies 20,15,21, we observed that the absolute effect size was negatively correlated with the minor allele frequency (Figure S4A). After assigning each pQTL to its corresponding linkage disequilibrium (LD) block 22, we identified a total of 508 unique genetic loci, including 84 pleiotropic regions. Notably, we found 33 proteins associated with the APOE locus (Figure S5A), which ranked second in terms of AFR proteomic-associated pleiotropic regions and genomic hotspots. The other top-five pleiotropic loci included VTN (chr17q11.2) with 36 proteins, ABO (chr9q34.2) with 24 proteins, CFH (chr1q31.3) with 23 proteins, and the MHC region with 22 proteins. Next, we performed a stratified analysis for individuals of European ancestry. Of the 6,907 proteins, 2,400 proteins showed 2,848 significant pQTLs in this population: 1,282 cis-pQTLs and 1,566 trans-pQTLs (Fig. 2B; Table S12; Figure S4B). Among the top five pleiotropic pQTL loci in EUR (746 regions in total), the APOE locus (chr19q13.32) was associated with 126 proteins (Figure S5B). The other top-five pleiotropic loci included VTN (chr17q11.2) with 182 proteins, CFH (chr1q31.3) with 151 proteins, the MHC region with 86 proteins, and the ABO locus (chr9q34.2) with 82 proteins.
To determine how many of the pQTLs have been reported before, we compared our study-wide pQTLs with the three largest external studies to date that cover both cis and trans associations and encompass the two genetic ancestries (see Methods). Of the four datasets from these three external pQTL studies (Table S13), Ferkingstad et al. 23 and Sun et al. 15 included participants of EUR ancestry, while Surapaneni et al. 20 and Sun et al. 15 sampled individuals of AFR ancestry. Overall, out of the 954 AFR pQTLs identified (Table S14), we found that 561 had been previously reported at a study-wide significant p-value threshold of 5×10⁻¹¹. Additionally, among the remaining 393 AFR pQTLs, 14 had been reported at a genome-wide threshold, 45 had passed a nominal threshold, and 242 did not show nominal significance in previous studies. Among the pQTLs that were not tested, 82 were due to missing proxy variants, and 10 were due to missing protein data. Because Ferkingstad et al. 23 profiled the largest number of proteins (~5k) in a large-scale European cohort, the highest number of our EUR pQTL findings replicated in this external study, with a p-value below 5×10⁻². Of the 2,848 EUR pQTLs identified (Table S14), 2,052 had been reported as study-wide significant (p < 5×10⁻¹¹), 43 were below a genome-wide threshold, 241 were below a nominal threshold, and 395 were above the nominal threshold. This indicates that 86% of the tested pQTLs have supportive evidence from previous studies. Among the untested pQTLs, 81 were due to missing proxy variants and 36 were due to missing protein data.
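The replication percentages quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and it assumes that "supportive evidence" means a pQTL reached at least nominal significance in an external study and that untested pQTLs are excluded from the denominator.

```python
def replication_summary(study_wide, genome_wide, nominal, not_nominal, untested):
    """Summarize replication counts for one QTL map.

    Assumption: a QTL counts as supported if it reached at least nominal
    significance externally; untested QTLs are left out of the rate.
    """
    tested = study_wide + genome_wide + nominal + not_nominal
    supported = study_wide + genome_wide + nominal
    return {
        "total": tested + untested,
        "tested": tested,
        "supported": supported,
        "rate": supported / tested,
    }


# Counts taken from the text above (untested = missing proxy + missing protein).
eur_pqtl = replication_summary(2052, 43, 241, 395, untested=81 + 36)
afr_pqtl = replication_summary(561, 14, 45, 242, untested=82 + 10)

print(eur_pqtl["total"], round(100 * eur_pqtl["rate"]))
print(afr_pqtl["total"], round(100 * afr_pqtl["rate"]))
```

Under these assumptions the EUR counts sum to the 2,848 reported pQTLs and give the 86% figure stated in the text; the AFR counts sum to 954.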

Genetic architecture of the plasma metabolome in participants of African and European ancestry
To detect genetic variants associated with the plasma metabolome in African and European ancestry, respectively, we performed mQTL mapping using the same participants from whom the proteomic data were generated (Fig. 2C-D). After quality control of both genotype and metabolome datasets, we identified 65 significant mQTLs in 34 genetic loci, associated with 60 metabolites (out of a total of 1,413 metabolites tested) in the African-American cohort (Fig. 2C, Table S15). Notably, a large share of these hits (27 out of 34 loci) were involved in and enriched for the super pathway of amino acids (odds ratio = 2.78, Fisher exact test, p-value = 6.63×10⁻⁵) compared to other pathways. In the European-American cohort, we found 490 significant mQTLs in 124 genetic regions associated with 403 metabolites (Fig. 2D, Table S16). The majority of the hits were observed in two super pathways: lipids (198 metabolites, odds ratio = 1.26, Fisher exact test, p-value = 0.019) and amino acids (106 metabolites, odds ratio = 1.52, Fisher exact test, p-value = 0.0015). In line with a previous cross-platform mQTL study 24, we observed that both sets of mQTLs followed a trend in which the absolute effect size was negatively correlated with the minor allele frequency across all variants (Figure S4C-D). Additionally, the top-ranked LD blocks associated with metabolites were loci near ALMS1P1 (chr2p13.1), UGT1A6 (chr2q37.1), and PYROXD2 (chr10q24.2) in the African cohort (Figure S5C), and FADS1/2 (chr11q12.2), SLCO1B1 (chr12p12.1), and ALMS1P1 (chr2p13.1) in the European-American cohort (Figure S5D). These regions represent metabolite-associated pleiotropic regions and genomic hotspots specific to their respective populations.
To assess how many mQTLs had been previously reported, we compared our study-wide mQTLs with three recent external studies that have full summary statistics available and that together include participants from both genetic ancestries (see Methods for details). Among these three external mQTL studies (Table S17), Yin et al. 25 and Chen et al. 26 focused on individuals of EUR ancestry alone, while Rhee et al. 27 included participants of AFR ancestry. Of the 65 AFR mQTLs identified (Table S18), we found that 48 had already been reported after multiple testing correction at a study-wide significance threshold of 5×10⁻¹¹. Among the remaining 17 AFR mQTLs, none had been reported at a genome-wide threshold, one passed a nominal threshold, seven were above the nominal threshold, and nine were not examined because the metabolites were missing. On the other hand, of the 490 EUR mQTLs identified (Table S18), 412 passed the study-wide p-value threshold of 5×10⁻¹¹, seven passed a genome-wide threshold, ten passed a nominal threshold, and nine did not reach the nominal threshold. Among the mQTLs that were not tested, two were due to missing proxy variants and 50 were due to missing metabolite data. These results indicate that approximately 88% of tested mQTL pairs in the African ancestry cohort and 98% of tested mQTL pairs in the European ancestry cohort have supportive evidence from previous studies.

Ancestry-specific pQTLs and mQTLs
To identify ancestry-specific xQTLs (i.e., pQTLs and mQTLs), we compared results between participants of African and European ancestry. Briefly, ancestry-specific hits were determined based on fold-change criteria that consider both the effect size and the standard error (used to derive the Z-normalized effect size), following the methodology used in previous condition-dependent genetic studies 29,30. A fold change greater than ten-fold or smaller than one-tenth was used as the threshold, corresponding to log10-based fold-change boundaries of +/-1, to determine context-specific QTLs (see Methods).
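As a rough illustration of the fold-change rule described above, the sketch below computes a Z-normalized effect size (effect size divided by standard error) in each ancestry and flags a QTL when the ratio of the two Z values is above ten or below one-tenth. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the direction of the ratio is an assumption, and treating a negative ratio (opposite effect directions) or a zero denominator as context-specific is a literal reading of the threshold rather than something the text confirms.

```python
def z_score(beta, se):
    """Z-normalized effect size: effect size divided by its standard error."""
    return beta / se


def is_context_specific(z_a, z_b, upper=10.0, lower=0.1):
    """Flag a QTL as context-specific from the ratio of two Z values.

    Assumptions (not confirmed by the source): the ratio is z_a / z_b,
    a ratio above `upper` or below `lower` is context-specific, negative
    ratios fall below `lower`, and an undefined ratio (z_b == 0) is
    treated as context-specific.
    """
    if z_b == 0:
        return True
    ratio = z_a / z_b
    return ratio > upper or ratio < lower


# Hypothetical effect sizes and standard errors for one variant-trait pair.
z_afr = z_score(0.60, 0.05)   # strong association in one ancestry
z_eur = z_score(0.01, 0.02)   # weak association in the other
print(is_context_specific(z_afr, z_eur))
```

Working with Z values rather than p-values keeps the comparison from being driven purely by the difference in sample size between the two ancestry groups.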
In the case of proteomics, of the 954 pQTLs identified in AFR participants, 29.6% were considered AFR-specific pQTLs (Fig. 3A, Table S19). For example, in African-ancestry participants, the protein levels of HSF2B (Heat shock factor 2-binding protein) were positively associated with the genetic variant chr5:14626365:T:C ( , ). However, in European-ancestry participants, the HSF2B protein levels were similar across all genotypes of the same variant ( , ), resulting in a large fold-change difference ( , Fig. 3B). Similarly, among the 2,848 pQTLs identified in EUR participants, 24.3% were considered EUR-specific pQTLs (Fig. 3C, Table S20). In the European-ancestry cohort, the protein levels of Apo A-V (Apolipoprotein A-V) decreased significantly with the minor allele dosage of the variant chr11:116780399:C:T ( , ), but this association was not observed in African-ancestry participants ( , ), leading to a fold-change ratio of -0.452 (Fig. 3D).
In the case of metabolomics, we compared mQTL results from participants of the two ancestries. Of the 65 mQTLs identified in AFR participants, 20% were classified as AFR-specific mQTLs (Fig. 3E, Table S21). For instance, in AFR participants, the abundance of the metabolite X-23447 increased with the minor allele dosage of the genetic variant chr5:175129766:C:T ( , ). However, in EUR participants, this metabolite displayed similar levels across all genotypes of the variant ( , ), resulting in a substantial fold-change difference ( , Fig. 3F). On the other hand, among the 490 mQTLs identified in EUR participants, 23.7% were considered EUR-specific mQTLs (Fig. 3G, Table S22). In the EUR cohort, the levels of myristoyl dihydrosphingomyelin (d18:0/14:0) were significantly and positively associated with the minor allele dosage of the variant chr20:12978750:T:C ( , ), whereas in the AFR group, this association was not observed ( , ), leading to a fold-change ratio of -0.289 (Fig. 3H).
To further characterize the ancestry-specific QTLs, we categorized the genetic variants into three bins based on minor allele frequency (MAF). The bins were defined as follows: bin-1, MAF ranging from 0 to 0.01; bin-2, MAF ranging from 0.01 to 0.05; and bin-3, MAF ranging from 0.05 to 0.5. The MAF used to bin each variant was the minimum MAF value between the two ancestries. We found that the larger the MAF of a genetic variant, the less likely it was to be specific to a particular ancestry (Figure S6A-G). On average, the proportion of ancestry-specific QTLs decreased from 68.5% to 45% as the MAF bins shifted from bin-1 to bin-2. Moreover, the proportion of ancestry-specific QTLs decreased to 10.8% at bin-3. Power analyses (see Methods) indicated that the sentinel xQTLs used in this ancestry-specificity section were well-powered given the current sample size. Even though variants with lower MAF tend to have lower power at the same effect size, all the ancestry-specific variants showed > 80% power in the other ancestry. As a complementary strategy, we employed a more flexible Bayesian approach, the multivariate adaptive shrinkage (MASH) framework 29, to calculate the posterior probability and posterior mean for each QTL-trait pair in the two ancestries. The posterior mean fold change also indicated a similar ancestry-sharing proportion, ranging from 82.3% to 96.2% (Figure S7A-D), which supports the previous estimates of ancestry-specific QTLs. These results align with previous studies (Zhang et al. 13 for proteomics and Rhee et al. 27 for metabolomics datasets). Zhang et al. 13 reported 10% EUR-specific and 30% AFR-specific cis-pQTLs, while Rhee et al. 27 uncovered 22% ancestry-specific mQTLs. Moreover, our findings extend these previous reports by analyzing different MAF bins, revealing that the ancestry-specific QTLs are more likely to have lower frequencies. This observation could be explained by the fact that functional variants tend to have lower frequencies than non-functional variants, and therefore, these ancestry-specific QTLs may capture those functional variants. Alternatively, participants of African ancestry may have a higher prevalence of rare variants compared to those of European ancestry, which increases the likelihood of finding ancestry-specific associations.
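The MAF binning described in this section can be sketched as follows. The bin edges come from the text, but the function names are hypothetical and the handling of the boundaries (a MAF of exactly 0.01 falls in bin-2, exactly 0.05 in bin-3) is an assumption, since the text does not say which side of each edge is inclusive.

```python
def maf_bin(maf_afr, maf_eur):
    """Assign a variant to a MAF bin using the minimum MAF across ancestries.

    Assumption: bins are half-open, [0, 0.01), [0.01, 0.05), [0.05, 0.5].
    """
    maf = min(maf_afr, maf_eur)
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if maf < 0.01:
        return "bin-1"
    if maf < 0.05:
        return "bin-2"
    return "bin-3"


def specific_fraction_by_bin(records):
    """Fraction of ancestry-specific QTLs per bin.

    `records` is an iterable of (maf_afr, maf_eur, is_specific) tuples.
    """
    counts = {}
    for maf_afr, maf_eur, is_specific in records:
        label = maf_bin(maf_afr, maf_eur)
        total, specific = counts.get(label, (0, 0))
        counts[label] = (total + 1, specific + int(is_specific))
    return {label: specific / total for label, (total, specific) in counts.items()}


# Hypothetical toy records, not data from the study.
toy = [(0.005, 0.20, True), (0.008, 0.30, False), (0.20, 0.30, False)]
print(specific_fraction_by_bin(toy))
```

Taking the minimum MAF across the two groups means a variant that is common in one ancestry but rare in the other is binned as rare, which is the case where limited power in one group matters most.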

Integration of proteins and metabolites with the risk of ancestry-matched T2D via PWAS and MWAS
Finally, to identify proteins associated with ancestry-specific risk of T2D, we first employed the PWAS framework. Specifically, we prioritized proteins that were associated with ancestry-matched T2D risk within each ancestry group, namely EUR- and AFR-stratified analyses (Fig. 4A-B, Figure S8-9, Table S23-24). For the EUR T2D risk analysis, we used the summary statistics from the GWAS published by
From the ancestry-stratified PWAS analyses, we found 74 proteins in EUR and eight in AFR associated with T2D after multiple testing correction (Fig. 4A). In the European ancestry group, the associated proteins included C4b, NFL, and BGAT, among others (Fig. 4A, top). For the African ancestry group, the associated proteins were FAM3D, MGAT3, SPC25, and BGAT (Fig. 4A, bottom). Four proteins (SPC25, BGAT, FAM3D, MGAT3) were found in common in the EUR and AFR-specific analyses.
We applied the same framework to assess the association between metabolite levels and T2D in an ancestry-specific manner (Fig. 4B, Figure S8-9, Table S25-26). We identified 23 EUR and two AFR metabolites at Bonferroni-corrected thresholds (Fig. 4B). Of the 23 significant metabolites from the EUR MWAS analysis (Fig. 4B, top), 14 have been studied in differential level analyses concerning T2D, and six of these metabolites were reported to be differentially expressed between T2D cases and controls in a comprehensive external study by Zaghlool et al. 31. Furthermore, the two significant metabolites from the AFR MWAS analysis (Fig. 4B, bottom) were GlcNAc sulfate conjugate of C21H34O2 steroid and 1-stearoyl-2-arachidonoyl-GPC. Neither of them had been reported previously in the differential expression study 31. One metabolite, 1-stearoyl-2-arachidonoyl-GPC (18:0/20:4), was in common between the EUR and AFR groups.

Integration of proteins and metabolites with the risk of ancestry-matched T2D via genetic colocalization and Mendelian randomization
To minimize the possibility of false-positive findings in XWAS (including PWAS and MWAS), we implemented two supplementary analyses between the molecular traits and T2D risk: genetic colocalization (Fig. 4C-D, Table S27-30) and Mendelian randomization (Fig. 4C-D, Table S31-34).
In the EUR-stratified analyses (Figure S10A-B), 36 proteins and 21 metabolites were significant in both the XWAS and colocalization analyses, and 13 proteins and six metabolites were significant in both the XWAS and MR. In the AFR-specific analyses (Figure S10C-D), four proteins and two metabolites were significant in both the XWAS and colocalization analyses. Moreover, two proteins were significant in both PWAS and MR.
To emphasize the proteins and metabolites with the highest confidence in terms of their association with T2D risk, we only highlighted those that were significant after multiple testing correction in all three analyses: XWAS, colocalization, and MR. Consequently, we pinpointed five proteins and four metabolites in the EUR-specific analyses (Figure S11A-E, Figure S12A-D) and one protein and one metabolite in the AFR-specific analyses (Figure S11F; Figure S12E; Table S35-38). No proteins were in common between the EUR- and AFR-specific analyses (Fig. 4E).
In the AFR-specific analyses (PWAS, colocalization, MR), we nominated a protein called QSOX2 (Fig. 4E, Figure S11F). This protein is regulated by a trans-pQTL at the known risk gene, ABO 6 (Table S35). It is worth noting that the variant chr9:133252214:G:A used as an instrumental variable was not in high LD (r² = 4×10⁻⁴) with the closest pleiotropic variant chr9:133254260:G:A, and hence it was included in the analysis. QSOX2 showed a positive association with T2D risk (PWAS.Z = 4.763 and MR.beta = 0.246). Additionally, the posterior probability of genetic colocalization between QSOX2 and the disease was high (PP.H4 = 0.977). Although QSOX2 has been previously annotated in the process of protein folding 32, it has not been published as a known T2D effector.
Of the five EUR proteins associated with EUR T2D 5 (Fig. 4E, Table S36, Figure S11A-E), Dtk, C1QT4, and MANBA were encoded by genes that were already known to be implicated in T2D. In this study, these proteins were nominated based on the PWAS, colocalization, and MR results. These results were driven by cis-pQTLs for these proteins. These loci were previously reported as genome-wide significant, and those genes were nominated based on variant annotation, genetic colocalization with eQTLs, pcHi-C links, and TWAS significance 5,6. Dtk has been reported to be involved in the positive regulation of kinase activity 32; C1QT4 plays roles in the positive regulation of interleukin-6 and NF-kappaB signaling; and MANBA participates in protein modification processes, specifically glycosylation.
As with the proteomic results, there were no metabolites in common between the EUR- and AFR-prioritized results based on the triple analyses (MWAS, colocalization, MR). The only AFR-significant metabolite, the GlcNAc sulfate conjugate of C21H34O2 steroid**, had not been reported to be implicated in T2D before (Fig. 4E, Table S37). This metabolite was regulated by a genetic variant near UGT3A1 (mQTL with chr5:35965868:A:C, -log10 p-value = 10.6; MWAS p-value = 7.79×10⁻⁴; MR FDR = 9.11×10⁻⁴ on AFR T2D), and it belonged to the partially characterized molecules. Interestingly, a similar metabolite (HMDB0001449) has been reported to be reduced in the blood of human patients with major depression 34.
To extend the XWAS and colocalization findings using an advanced statistical approach, we applied the recently published INTACT framework 36 (Table S39-42). In summary, we found ancestry-matched T2D associations with six EUR and two AFR proteins, and six EUR and one AFR metabolite (details in Supplementary Notes). These associations contained all the previously mentioned pairs, along with additional pairs related to one EUR protein (MANS4) and one AFR protein (BGAT), as well as two EUR metabolites (5-oxoproline and betaine). Notably, both proteins were reported as effector genes in previous multi-ancestry T2D GWAS 5,6, whereas the two metabolites were not previously identified as such.
Moreover, to characterize the cell-type patterns of the proteins and metabolites implicated here in T2D, we performed a cell-type-specific analysis using stratified LDSC 37. Regardless of ancestry, these proteins and metabolites displayed high expression in T cells and myeloid cells (Figure S13). Finally, to nominate druggable targets for repositioning, we queried the proteins and metabolites against the DrugBank database 38. As a result, we found that one EUR protein, Dtk, could be targeted by Fostamatinib, an FDA-approved drug used to treat chronic immune thrombocytopenia. On the other hand, the AFR protein, BGAT, was targetable by 13 drugs, though none of them had FDA approval at the time of this study. Furthermore, two EUR metabolites, 5-oxoproline and betaine, could be targeted by pidolic acid and N,N,N-trimethylglycinium, respectively. Notably, pidolic acid has already been approved by the FDA to treat a family history of diabetes.

Discussion
Our study involved large-scale multi-ancestral multi-omic plasma-based QTL mapping from the same cohort. Using XWAS, colocalization, and Mendelian randomization approaches, we identified key proteins and metabolites implicated in T2D. More importantly, our findings revealed ancestry-specific results. It is known that different ancestries have different genetic architectures for the same traits; here we extend the genetic findings to the downstream functional analytes underlying T2D. Specifically, our EUR-specific analyses uncovered five proteins (including three novel) and four metabolites (including three not previously reported) associated with T2D. Similarly, the AFR-specific analyses identified one previously unreported protein and one previously unreported metabolite.
Even though we identified AFR-specific proteins and metabolites, the power to identify unique signals in this ancestry is still lower than in EUR, as the sample sizes of the disease GWAS and the omic data are much smaller in AFR, leading to fewer disease-associated loci and xQTLs. Therefore, future research with larger disease GWAS and QTL maps from diverse populations is still necessary. In this study, we only nominated proteins or metabolites that met the stringent criterion of being identified through all three integrative strategies: XWAS, colocalization, and MR. This gives the highest confidence that the identified effectors are implicated in T2D. However, some proteins and metabolites met only two of the three criteria and could warrant broader investigation.
There are four notable strengths of this study. First, it represents a large-scale research endeavor that encompasses multi-ancestry and multi-omics analyses, allowing for a comprehensive exploration of diverse populations and molecular trait layers at the same time. Second, we included trans-QTLs in the conventional cis-QTL framework 13,14, enabling the discovery of more heritable features and expanding our understanding of the genetic underpinnings of the studied traits. In addition, we integrated these QTL data with FUSION, colocalization, and Mendelian randomization approaches to identify high-confidence proteins and metabolites implicated in T2D. Our analyses provide additional support for some of the nominated effector genes (Dtk, C1QT4, and MANBA), and also nominated two new proteins in the EUR-specific analyses. One protein, TBCE, is located in a known locus, and we nominate this protein as functional in this locus. As mentioned, in this study we not only performed cis-QTL mapping but also extended it to trans-QTLs, as this can help to identify novel molecular interactions that would otherwise not be identified. The other novel protein identified in this study as implicated in T2D is TPP2. The association of this protein with T2D is through a trans-QTL with a SNP located in a locus that includes ERT1 or MFHAS1 as potential effector genes. Additional studies are needed to determine how TPP2 and ERT1 or MFHAS1 interact with each other, but this study provides strong evidence that these genes are part of the same pathway, and it is a good example of how such unbiased analyses can identify new protein-protein interactions. Third, by performing ancestry-matched omics-disease integration, we enhanced the accuracy of our findings in comparison to previous studies that included ancestry-mixed data 5,6. This approach may contribute to more precise identification of T2D effector genes within specific ancestral populations. Fourth, to ensure the robustness of our conclusions, we cross-referenced our findings against the two largest multi-ancestry T2D studies 5,6. This stringent evaluation allowed us to determine whether our identified effectors had been reported previously or not, further reinforcing the significance of our results.
On the flip side, there are several limitations to consider in our study. First, our study did not integrate proteomics and metabolomics data due to the lack of colocalization between protein, metabolite, and the ancestry-matched T2D risk loci (data not shown). Nonetheless, we believe that genetic colocalization of proteomics and metabolomics could exist if we were not solely focused on T2D-associated loci. Second, there were unequal sample sizes between participants of EUR and AFR ancestry. This discrepancy in sample sizes affected the power to detect QTLs, even though we were well-powered to identify the sentinel variants (Table S9). To mitigate this bias in identifying ancestry-specific findings, we employed standardized z-values that accounted for both effect size and standard error, rather than relying solely on p-values, when comparing the two ancestral groups. Third, our study utilized bulk plasma, which may not reflect the cell types of interest when studying T2D, such as pancreatic islets or beta cells. However, as plasma circulates throughout the body 39, our study holds value in investigating human metabolic disorders in general. Fourth, our study utilized one platform for measuring proteomics and another for metabolomics, which can lead to platform-biased results. We addressed this concern, however, by comparing our findings with external pQTL and mQTL studies that used both the same and different platforms. Fifth, our cohort consisted of participants with various disease statuses, including Alzheimer disease, frontotemporal dementia, and healthy individuals. We and other researchers, however, have reported that few pQTLs 21,40-42 or mQTLs 25,43,44 were status-specific. This suggests that disease status is unlikely to have a significant impact, although further studies will be necessary.
Sixth, we defined our participants based on genetic ancestry 45; thus we used "African" ancestry rather than African American. We acknowledge that our study may contain admixed participants, which could lead to underestimation of the ancestry-specific features observed.
While our study focused on applying these plasma xQTLs to the study of T2D, it is worth noting that these QTL maps can be expanded to explore other diseases as well.These nominated proteins and metabolites are key intermediate phenotypes that can connect the genotype to the disease endpoint.Therefore, identifying these effectors in diverse populations may be an initial step toward developing more precise prediction models and therapies.

Ethics declarations
This project was approved by the ethics committee of the Washington University School of Medicine in St. Louis.

Cohort information
All participants were recruited at the Knight Alzheimer Disease Research Center (Knight ADRC). In total, 3,170 participants from all genetic ancestries were selected for both proteomics and metabolomics profiling.
We used the TOPMed recommendations 46 when defining ancestries based on genetic information (see the following section "Genotype QC, imputation, and population stratification"). Therefore, we used the terms "European (EUR)" and "African (AFR)" when referring to participants recruited at the Knight ADRC (USA) with European and African genetic backgrounds, respectively, regardless of the country of origin. Most participants recruited at the Knight ADRC could also be classified as "European American" or "African American" in terms of race and ethnicity.
The plasma proteomic and metabolomic datasets were generated from participants who included 1,254 AD patients, 1,720 healthy controls, 34 frontotemporal dementia patients, and 162 individuals with an unclassified neurodegenerative disease. The cohort was a subset of the participants recruited from the Knight ADRC, which has included community-dwelling adults older than 27 years via prospective studies of memory and aging since 1979. All participants recruited from the Knight ADRC are required to participate in core study procedures, including annual longitudinal clinical assessments, neuropsychological testing, neuroimaging, and biofluid biomarker studies. The corresponding genotype data were available before participants were chosen for proteomic and metabolomic profiling in this study.
Plasma samples were collected in the morning after an overnight fast, immediately centrifuged, and stored at -80°C.

Proteomics data QC
In brief, 3,132 participants and 6,907 aptamers passed proteomics QC. Before proteomics QC, 7,548 aptamers were measured using the SOMAscan 7k platform 18. Plasma proteomics data from all genetic ancestries underwent QC in seven steps (see Supplementary Notes for details): Step 1) Limit of detection, scale factor difference, and coefficient of variation; Step 2) IQR-based outlier expression level detections; the IBD. Unrelated participants were defined as PI_HAT < 0.25. In total, 2,395 EUR and 418 AFR participants were kept as unrelated participants.

Identi cation of pQTLs
A linear regression model was used from plink2 47 glm function for each protein.Protein-abundances were log-10 transformed rst and z-scale normalized next.Covariates were age, sex, genotyping array types, genotype PC 1-10, and proteomics PC 1-2 (to correct such batch effects from the proteomics data alone: we identi ed two different batches when visualizing the scatterplots of proteomic PC1 and PC2 (Figure S1E).But after adjusting for the proteomic PC1 and PC2, the batch effect was corrected (Figure S1F)).Genotype array types included Quad660, CoreEx, GSA_v1, GSA_v2, GSA_v3, NeuroX2, Human1M.Duov3, when using as covariates, dummy variable included n-1 rather n.The nal sample size for EUR and AFR pQTL analyses were 2,338 and 414 (Figure S3A, Table S3).The nal numbers of proteins for EUR and AFR pQTL analyses were both 6,907 (Figure S3B, Table S4-5).
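The preprocessing described above (log10 transform followed by z-scaling, and n-1 dummy variables for array type) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample standard deviation is assumed for z-scaling, and the alphabetically first array type is arbitrarily taken as the reference level; the actual analysis was run with plink2.

```python
import math
import statistics


def log10_zscale(values):
    """Log10-transform abundances, then center and scale to unit variance.

    Assumption: the sample standard deviation (n-1 denominator) is used.
    """
    logged = [math.log10(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    return [(x - mean) / sd for x in logged]


def dummy_encode(array_types):
    """Encode a categorical covariate with n-1 indicator columns.

    Assumption: the alphabetically first level is the reference and
    therefore gets no column of its own.
    """
    levels = sorted(set(array_types))
    columns = levels[1:]
    rows = [[int(a == level) for level in columns] for a in array_types]
    return columns, rows


# Hypothetical toy values, not data from the study.
print(log10_zscale([10.0, 100.0, 1000.0]))
print(dummy_encode(["GSA_v1", "CoreEx", "GSA_v1"]))
```

Dropping one level avoids perfect collinearity with the intercept, which is why n-1 rather than n indicators are used.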
To define cis and trans associations, we used a window of 1 Mb upstream and downstream of the start site of the gene encoding each protein. P values for each variant-protein pair were estimated using an additive linear regression model. The cis threshold was 5×10⁻⁸. For the trans-pQTL analysis, the genome-wide threshold was divided by the number of proteomics PCs, which was 1,472 for EUR and 336 for AFR. (The number of PCs was derived as the minimum number of PCs that cumulatively explain 95% of the variance of the proteomics expression matrix of each ancestry after QC.) Thus, the P-value threshold for EUR was 3.40×10⁻¹¹ (5×10⁻⁸/1,472) and for AFR was 1.49×10⁻¹⁰ (5×10⁻⁸/336).
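The threshold arithmetic above can be checked with a short sketch. It assumes that the per-PC explained-variance ratios are already available (for example from a PCA of the post-QC expression matrix); the function names and the toy ratios are hypothetical.

```python
def n_pcs_for_variance(explained_ratios, target=0.95):
    """Return the smallest number of leading PCs whose cumulative
    explained-variance ratio reaches the target."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return count
    raise ValueError("the supplied PCs never reach the target variance")


def corrected_threshold(n_pcs, genome_wide=5e-8):
    """Divide the genome-wide threshold by the effective number of tests."""
    return genome_wide / n_pcs


# Toy explained-variance ratios, not study data
toy_n = n_pcs_for_variance([0.6, 0.3, 0.06, 0.04])

# PC counts reported in the text
eur_threshold = corrected_threshold(1472)
afr_threshold = corrected_threshold(336)
```

Using the number of PCs rather than the number of measured proteins treats correlated proteins as fewer effective tests, which gives a less conservative correction than dividing by 6,907.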

Identification of mQTLs
A linear regression model was fitted for each metabolite using the plink2 47 glm function. Metabolite levels were first normalized by the median value of the same metabolite and then log10-transformed.
For the mQTL analysis, the genome-wide threshold was divided by the number of metabolomics PCs, which was 766 for EUR and 281 for AFR. (The number of PCs was derived as the minimum number of PCs that cumulatively explain 95% of the variance of the metabolomics expression matrix of each ancestry after QC.) Thus, the P-value threshold for EUR was 6.53×10⁻¹¹ (5×10⁻⁸/766) and for AFR was 1.78×10⁻¹⁰ (5×10⁻⁸/281).

Filtering the inflated features
For the inflated features (i.e., features associated with variants on more than 5/3/7/3 different chromosomes for EUR pQTL/AFR pQTL/EUR mQTL/AFR mQTL, respectively [the thresholds were chosen empirically]), we first removed the variants for the given feature with MAF < 0.05 and genotyping call rate < 97%. If a feature was still inflated after this step, we removed the feature itself. The unique features removed are listed below: EUR proteomics: 142 aptamers; AFR proteomics: 132 aptamers; EUR metabolomics: six metabolites; AFR metabolomics: five metabolites.
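The chromosome-count criterion can be sketched as follows. This is only an illustration of the counting step: the input format (pairs of feature and chromosome for significant associations), the function name, and the feature names are hypothetical, and the variant-level filter would be applied before re-running the count.

```python
from collections import defaultdict

# Empirical chromosome limits stated in the text
CHROM_LIMITS = {"EUR_pQTL": 5, "AFR_pQTL": 3, "EUR_mQTL": 7, "AFR_mQTL": 3}


def inflated_features(hits, max_chromosomes):
    """Return features whose significant variants span more than
    max_chromosomes distinct chromosomes.

    hits: iterable of (feature, chromosome) pairs.
    """
    chromosomes = defaultdict(set)
    for feature, chrom in hits:
        chromosomes[feature].add(chrom)
    return sorted(f for f, c in chromosomes.items() if len(c) > max_chromosomes)


# Toy associations, not study data
hits = [
    ("aptamer_A", "1"), ("aptamer_A", "2"), ("aptamer_A", "3"), ("aptamer_A", "4"),
    ("aptamer_B", "1"), ("aptamer_B", "1"), ("aptamer_B", "2"),
]
flagged_afr = inflated_features(hits, CHROM_LIMITS["AFR_pQTL"])
flagged_eur = inflated_features(hits, CHROM_LIMITS["EUR_pQTL"])
```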

Annotation of the xQTLs
To annotate our QTL findings, we used the command-line tool Variant Effect Predictor (VEP 48) from Ensembl release 107. We used the default options for all four QTL maps.

Replication of xQTLs with external studies
To replicate our QTL findings, we queried all study-wide significant feature-variant pairs from our study against several of the largest external studies. All of these studies released their full summary statistics with genomic coordinates in hg38. A proxy variant was defined as one with LD r² ≥ 0.8, using as reference the TOPMed 3 WGS data curated by the tool TOP-LD 49.
To replicate our proteomics findings, we used four datasets from three studies. Ferkingstad et al. 23 used 35k participants of European ancestry and the SOMAscan 5k platform to measure the plasma proteome. Surapaneni et al. 20 used 466 participants of African ancestry and the SOMAscan 7k platform to measure the serum proteome. Sun et al. 15 used 34k participants of European ancestry as well as 931 participants of African ancestry and the OLINK 3k platform to measure the plasma proteome. We set six categories when comparing the study-wide significant findings from this study with the corresponding external studies: 1) validated, with a p-value below the Bonferroni-corrected study-wide threshold (5×10⁻¹¹), accounting for 1,000 features; 2) known, with a p-value below the genome-wide threshold (5×10⁻⁸); 3) replicated, with a p-value below the nominal threshold (5×10⁻²); 4) not replicated, with a p-value greater than or equal to the nominal threshold; 5) not reported, with a matching protein but a missing proxy variant; 6) not reported, with a non-matching protein.
To replicate our metabolomics findings, we used three studies. Yin et al. 25 used 6,136 participants of European ancestry from Finland and the Metabolon HD4 platform to measure the plasma metabolome. Chen et al. 26 used 8,299 participants of European ancestry and the Metabolon HD4 platform to measure the plasma metabolome. Rhee et al. 27 used 687 participants of African ancestry and the Broad Institute platform to measure the plasma metabolome. For the mQTLs of African ancestry by Rhee et al. 2022, 207 metabolites overlapped between their platform (Broad Institute) and our platform (Metabolon HD4). This overlap was determined via HMDB-ID matching rather than chemical name (Table S8). We set six categories when comparing the study-wide significant findings from this study with the corresponding external studies: 1) validated, with a p-value below the Bonferroni-corrected study-wide threshold (5×10⁻¹¹), accounting for 1,000 features; 2) known, with a p-value below the genome-wide threshold (5×10⁻⁸); 3) replicated, with a p-value below the nominal threshold (5×10⁻²); 4) not replicated, with a p-value greater than or equal to the nominal threshold; 5) not reported, with a matching metabolite but a missing proxy variant; 6) not reported, with a non-matching metabolite.
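The six replication categories used for both proteomics and metabolomics amount to a simple decision rule, sketched below. The function and argument names are hypothetical; a missing external p-value is represented as None on the assumption that it reflects a missing proxy variant.

```python
def replication_category(feature_measured, external_p,
                         study_wide=5e-11, genome_wide=5e-8, nominal=5e-2):
    """Assign one of the six replication categories to a feature-variant pair.

    feature_measured: whether the external study measured a matching feature.
    external_p: p-value of the variant (or its proxy) in the external study,
                or None when neither the variant nor a proxy is available.
    """
    if not feature_measured:
        return "not reported (non-matching feature)"
    if external_p is None:
        return "not reported (missing proxy variant)"
    if external_p < study_wide:
        return "validated"
    if external_p < genome_wide:
        return "known"
    if external_p < nominal:
        return "replicated"
    return "not replicated"


# Toy inputs, not study data
examples = [
    replication_category(True, 1e-12),
    replication_category(True, 1e-9),
    replication_category(True, 0.01),
    replication_category(True, 0.2),
    replication_category(True, None),
    replication_category(False, None),
]
```

The checks are ordered from the strictest to the most lenient threshold, so each pair receives the strongest category it qualifies for.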

Definition of LD block and pleiotropy
To define LD blocks, we used the 1000G EUR population (1,703 blocks) as the reference, following ldetect by Berisa and Pickrell 22. We performed liftover to map the hg19 coordinates to hg38. We next used the block index to group the variants and obtained the pleiotropic regions, defined as blocks associated with multiple molecular traits. For proteomics, we used the karyoploteR 50 package to visualize the top findings as an ideogram. For metabolomics, we used the circlize 51 package to visualize the top findings as a chord diagram.

Identification of ancestry-specific QTLs
Ancestry specificity was defined as a fold-change above 10 or below 0.1 between the Z-normalized effect sizes (beta divided by standard error) of the protein-variant or metabolite-variant pairs for the same variant. The fold-changes of the same feature-variant pairs were also calculated within three MAF bins: 0 to 0.01, 0.01 to 0.05, and 0.05 to 0.5.
The MASH method 29 (implemented in the mashR package) was also used. Briefly, we fitted the mash model using the betas and standard errors of the same QTLs from both the EUR and AFR datasets as input, together with covariance matrices set up from the same input. The fold-changes of the posterior means of the protein-variant or metabolite-variant pairs for the same variants were then calculated. The same thresholds of 10-fold and 0.1-fold were used to determine whether a QTL was shared.
Boxplots were drawn with the ggplot2 package; LocusZoom plots were drawn with the LocusZoom.js tool 52.
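The fold-change criterion can be written out as a minimal sketch. The function names are hypothetical, and two choices are assumptions rather than details given in the text: the ratio is taken as EUR over AFR, and the thresholds are applied literally to the signed ratio, so a sign-discordant pair falls below 0.1 and is labeled ancestry-specific.

```python
def z_score(beta, se):
    """Z-normalized effect size: beta divided by its standard error."""
    return beta / se


def is_ancestry_specific(beta_eur, se_eur, beta_afr, se_afr,
                         upper=10.0, lower=0.1):
    """Flag a feature-variant pair whose Z-score fold-change between
    ancestries lies outside the [lower, upper] interval.

    Assumes a non-zero AFR effect estimate (the ratio is undefined otherwise).
    """
    fold = z_score(beta_eur, se_eur) / z_score(beta_afr, se_afr)
    return fold > upper or fold < lower


# Toy effect estimates, not study data
specific = is_ancestry_specific(0.5, 0.02, 0.1, 0.1)   # Z = 25 vs 1
shared = is_ancestry_specific(0.3, 0.03, 0.25, 0.05)   # Z near 10 vs 5
```

Because the Z-score scales with sample size, the much larger EUR cohort tends to inflate this ratio on its own, which is one motivation for the power analyses and the MASH comparison.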

Power analyses of ancestry-specific QTLs
We performed two separate power analyses using the powerEQTL.ANOVA function from the R package powerEQTL 53, as listed below:
a) We calculated the power values for all our current sentinel variants from each of the four QTL sets after splitting them into ancestry-shared and ancestry-specific subtypes. Given the input of MAF, the average standardized effect size per ancestry-specificity, the sample size, and the genome-wide significance thresholds (FWER = 0.05, nTests = 1e6), we found all variant-feature pairs with a power of 0.8 or more (Table S9).
b) We next fixed the effect size and number of tests while varying the MAF for each of the four QTL sets. We split the MAF into minimum, average, and maximum for ancestry-shared and ancestry-specific xQTLs. We derived the MAF and the average standardized effect size empirically and used the genome-wide significance thresholds for consistency between ancestries and between xQTL specificity classes.
We found that the minimum MAF led to underpowered findings, especially for the ancestry-specific xQTLs. For example, in the EUR mQTL set, the power was 0.007 for identifying EUR-specific findings in AFR given the minimum MAF of the ancestry-specific variants (Table S10). In contrast, the power was 1 for identifying the EUR-shared mQTLs in AFR given the minimum MAF of the shared variants.

Cross-reference of the ancestry-specific xQTLs with external studies
We cross-referenced Zhang et al. 13 for proteomics to examine the proportion of ancestry-specific pQTLs. We calculated the percentage of ancestry-specific pQTLs as the number of entries with the variable "EA-specific" set to TRUE over all EUR pQTLs in their Supplementary Table 3.1, and the number with the variable "AA-specific" set to TRUE over all AFR pQTLs in their Supplementary Table 3.2.
We cross-referenced Rhee et al. 27 for metabolomics to check the proportion of ancestry-specific mQTLs. We derived the percentage of ancestry-specific mQTLs as those falling outside the 10-fold-change range of the Z-normalized effect sizes (beta divided by standard error) between EUR and AFR, over the total of 45 mQTLs in their Table 2.

T2D risk GWAS
We used two population-scale T2D risk GWAS, one for EUR and one for AFR.
For EUR T2D risk GWAS, we used the summary statistics from the study by Mahajan et al., 5  To perform the downstream QTL and disease integration, we aggregated the non-overlapping and non-partially significant (p ≥ 1e-4) variants from the multi-ancestry GWAS by Mahajan et al. 5 into the original AFR GWAS to form the final full summary statistics for AFR.

PWAS weight calculation
A modified version of FUSION 12 was used. The SNP-based heritability of each protein SOMAmer was estimated using the GCTA GREML 54 tool. The proteins with negative h² values were removed before performing the weight calculation. The window size of the sentinel QTL region was 1 Mb. For the proteins associated with more than one genetic region, all variants from each region were included in the weight calculation. Using the FUSION R package, we constructed imputation models for 881 AFR and 2,400 EUR SOMAmers. The imputation model for a SOMAmer was the best-performing model (out of Elastic Net, TOP1, and BLUP) trained on all variants within 1 Mb upstream and downstream of the sentinel pQTL sites of the target protein. The Elastic Net model was refitted using all data and the tuning parameters selected by 5-fold cross-validation.

PWAS association test with T2D
A modified version of FUSION 12 was used. We used the 881 (AFR) and 2,400 (EUR) imputation models to perform the PWAS on the ancestry-matched T2D risk. The tool Functionally-informed Z-score Imputation (FIZI 55) was first used to impute the summary statistics of AFR T2D risk (Vujkovic et al., 2020 6) and EUR T2D risk (Mahajan et al., 2022 5) with the in-sample reference linkage-disequilibrium (LD) information.
The multiple-testing correction for the PWAS results was based on the total number of imputation models for all plasma proteins with non-missing weights (P value < 0.05/797 in AFR and 0.05/2,285 in EUR). The Z value from PWAS was used to determine the effect size of protein-T2D associations within each ancestry.
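For orientation, the summary-statistic association test commonly described for FUSION-style analyses combines the GWAS Z-scores z with the imputation weights w and the LD matrix Σ as Z = w'z / sqrt(w'Σw). The sketch below implements that formula under the assumption that the modified FUSION used here retains this form; the function names and toy inputs are hypothetical.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """Weighted combination of GWAS Z-scores, scaled by the variance
    implied by the LD (correlation) matrix among the same variants."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


def bonferroni(alpha, n_models):
    """Per-test significance threshold for n_models imputation models."""
    return alpha / n_models


# Toy inputs, not study data
z_single = pwas_z([1.0, 0.0], [3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
z_linked = pwas_z([0.5, 0.5], [2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]])

# Model counts reported in the text
afr_threshold = bonferroni(0.05, 797)
eur_threshold = bonferroni(0.05, 2285)
```

With a single non-zero weight the statistic reduces to the GWAS Z-score of that variant, which is the behavior of a TOP1-type model.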

MWAS weight calculation
Similar to the above PWAS weight calculation section, a modified version of FUSION 12 was used. The SNP-based heritability of each metabolite was estimated using the GCTA GREML 54 tool. The metabolites with negative h² values were removed before performing the weight calculation. The window size of the sentinel QTL region was 1 Mb. For metabolites associated with more than one genetic region, all variants from each region were included in the weight calculation. Using the FUSION R package, we constructed imputation models for 60 AFR and 403 EUR metabolites. The imputation model for a metabolite was the best-performing model (out of Elastic Net, TOP1, and BLUP) trained on all variants within 1 Mb upstream and downstream of the sentinel mQTL sites of the corresponding metabolite. The Elastic Net model was refitted using all data and the tuning parameters selected by 5-fold cross-validation.

MWAS association test with T2D
Similar to the above PWAS association test section, a modified version of FUSION 12 was used. We used the 60 (AFR) and 403 (EUR) imputation models to perform the MWAS on the ancestry-matched T2D risk (AFR T2D risk (Vujkovic et al., 2020 6) and EUR T2D risk (Mahajan et al., 2022 5)) after FIZI imputation.
The multiple-testing correction for the MWAS results was based on the total number of imputation models for all plasma metabolites with non-missing weights (P value < 0.05/58 in AFR and 0.05/401 in EUR). The Z value from MWAS was used to determine the effect size of metabolite-T2D associations within each ancestry.

Identification of ancestry-specific XWAS-T2D findings
We compared the p-values of the same analyte-T2D associations between the two ancestries. If the analyte-T2D association was significant in both ancestries, the analyte was considered ancestry-shared. If the analyte-T2D association was significant in only one ancestry, the analyte was considered ancestry-specific. The Miami plots for PWAS and MWAS comparing EUR and AFR findings were plotted using the R package hudson 56.
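The classification rule above can be expressed as a small sketch. The function name is hypothetical, a missing p-value (None) is assumed to mean that no imputation model was available in that ancestry, and the thresholds shown are the PWAS Bonferroni values reported earlier.

```python
def classify_analyte(p_eur, p_afr, threshold_eur, threshold_afr):
    """Label an analyte by the ancestries in which its T2D association
    passes the ancestry-matched significance threshold."""
    sig_eur = p_eur is not None and p_eur < threshold_eur
    sig_afr = p_afr is not None and p_afr < threshold_afr
    if sig_eur and sig_afr:
        return "ancestry-shared"
    if sig_eur:
        return "EUR-specific"
    if sig_afr:
        return "AFR-specific"
    return "not significant"


# PWAS thresholds from the text
THRESHOLD_EUR = 0.05 / 2285
THRESHOLD_AFR = 0.05 / 797

# Toy p-values, not study data
labels = [
    classify_analyte(1e-8, 1e-6, THRESHOLD_EUR, THRESHOLD_AFR),
    classify_analyte(1e-8, 0.3, THRESHOLD_EUR, THRESHOLD_AFR),
    classify_analyte(None, 1e-6, THRESHOLD_EUR, THRESHOLD_AFR),
    classify_analyte(0.01, 0.01, THRESHOLD_EUR, THRESHOLD_AFR),
]
```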

Colocalization of molecular traits (proteins/metabolites) and T2D
After PWAS and MWAS, only the significant proteins/metabolites were kept. To remove the LD bias in the significant PWAS/MWAS findings, we performed colocalization analysis using both the coloc.abf function from the R package coloc v3.1 7 and the coloc.susie function from the R package coloc v5.1 8, with a wrapper for the susie_rss function from the susieR 57,58 package. We set the window size to ±1 Mb centered on the instrumental variable (IV) for each trait-T2D pair. We used the default priors, with p1 as 1×10⁻⁴, p2 as 1×10⁻⁴, and p12 as 1×10⁻⁵. Evidence for colocalization was assessed using the posterior probability (PP) for hypothesis 4 (indicating that there is an association for both protein and disease and that they are driven by the same causal variant(s)). We used PP.H4_final > 80% as the threshold to suggest that associations were highly colocalized. Under the assumption of only a single causal variant, we used the PP.H4 from the coloc.abf output of the trait-disease pair. Under the assumption that multiple causal variants exist 57, we used the maximum PP.H4 across the credible sets from the coloc.susie output.

Mendelian randomization of proteins or metabolites on T2D
After PWAS/MWAS, only the significant proteins or metabolites were kept. To further infer the causal effects of the proteins or metabolites on T2D risk, we performed MR analyses after removing pleiotropic QTLs. These pleiotropic QTLs were defined as variants within the pleiotropic regions associated with at least a minimum number of features: 6 proteins for EUR and AFR proteomics, and 11 metabolites for EUR and AFR metabolomics.
After keeping the remaining variants as the valid instrumental variables, we used the R package TwoSampleMR 9 v0.5.7, which includes two primary methods: for a single SNP, the most basic estimator, the Wald ratio, was used; for multiple SNPs, the inverse variance weighted (IVW) estimator was used. All pQTLs or mQTLs used as instrumental variables had F-statistics greater than 10 and study-wide significance after in-sample LD clumping with plink1.9 47. As not all features within each cohort are independent, we used a false discovery rate (FDR) < 0.05 as our multiple-testing correction approach.
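The two estimators named above can be illustrated with a minimal sketch. It is not a re-implementation of TwoSampleMR: the IVW function is the plain fixed-effect form, whereas the package's default IVW additionally rescales the standard error by the residual dispersion, and the per-variant F-statistic is approximated as the squared Z-score of the exposure association. All function names and inputs are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def f_statistic(beta_exp, se_exp):
    """Approximate instrument strength for a single variant."""
    return (beta_exp / se_exp) ** 2


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate: outcome effect divided by exposure effect."""
    estimate = beta_out / beta_exp
    se = se_out / abs(beta_exp)
    return estimate, se, two_sided_p(estimate / se)


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance weighted estimate over several instruments."""
    weights = [1.0 / s ** 2 for s in se_out]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exp))
    estimate = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return estimate, se, two_sided_p(estimate / se)


# Toy summary statistics, not study data
single = wald_ratio(0.5, 0.1, 0.02)
multi = ivw_fixed([0.5, 1.0], [0.1, 0.2], [0.02, 0.02])
strong = f_statistic(0.5, 0.05) > 10
```

When every instrument implies the same ratio, as in the toy example, the IVW estimate equals that ratio and its standard error is smaller than that of any single Wald ratio.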

Evidence integration of proteins/metabolites on T2D
After intersecting PWAS or MWAS with coloc, we considered MR as an extra layer of evidence. We first defined as significant the proteins/metabolites with XWAS passing Bonferroni correction, coloc PP.H4 > 0.8, and MR FDR < 0.05. We next used the INTACT framework 36 to compute an updated PP with XWAS and coloc as input. For features with INTACT PP > 0.8, we further required MR FDR < 0.05 to define significant proteins/metabolites.
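The intersection of the three evidence layers can be sketched as below. The Benjamini-Hochberg procedure is assumed for the FDR step, since the text does not name the method; the posterior probability passed in may be either the coloc PP.H4 or the INTACT PP. Function names and feature names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def integrate(xwas_p, posterior, mr_fdr, xwas_threshold,
              pp_cutoff=0.8, fdr_cutoff=0.05):
    """Keep features that pass all three evidence layers."""
    return sorted(
        name for name in xwas_p
        if xwas_p[name] < xwas_threshold
        and posterior[name] > pp_cutoff
        and mr_fdr[name] < fdr_cutoff
    )


# Toy inputs, not study data
names = ["protein_A", "protein_B", "protein_C", "protein_D"]
mr_fdr = dict(zip(names, benjamini_hochberg([0.01, 0.04, 0.03, 0.5])))
xwas_p = {"protein_A": 1e-6, "protein_B": 1e-6, "protein_C": 1e-6, "protein_D": 0.2}
posterior = {"protein_A": 0.9, "protein_B": 0.95, "protein_C": 0.5, "protein_D": 0.9}
selected = integrate(xwas_p, posterior, mr_fdr, xwas_threshold=0.05 / 2285)
```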

Effector genes of proteins/metabolites on T2D
We first assembled the effector genes of T2D risk GWAS from two multi-ancestry studies 5,6. For the study by Mahajan et al., we combined their missense annotations, colocalization with seven-tissue eQTLs and plasma pQTLs, and pcHi-C annotations. In total, 834 unique genes were nominated by the authors. For the study by Vujkovic et al., we combined their missense annotations, TWAS, and colocalization results from 52 tissue eQTLs. In total, 754 unique genes were nominated by the authors. We next queried our protein and metabolite findings using the nearest gene to the genetic locus. If the gene was found in the effector list, we defined the finding as previously reported.

Cell-type specificity analysis of the proteins/metabolites on T2D
After intersecting the INTACT and MR results, we performed cell-type specificity analysis of the proteins/metabolites on T2D using S-LDSC 37 with the cell-type annotation from ImmGen. Overall, 295 cell subtypes were used and grouped into five major cell types: B, Myeloid, Natural Killer (NK), T, and other cells. The input for S-LDSC was the genome-wide summary statistics of each protein or metabolite. The output of S-LDSC was the regression coefficient of the cell type-specific annotation from ImmGen as well as the p-value of the coefficient for each feature. We then ranked the cell types by p-value and used the top cell type as the enriched cell type for each feature. We used two input files: one from the INTACT and MR integration results, and the other from all features with at least one QTL. We calculated the fold-change between the two sets for the same feature-ancestry combination (i.e., EUR proteins, AFR proteins, EUR metabolites, and AFR metabolites).

Druggable target query of the proteins/metabolites implicated in T2D
To nominate targets for repositioning, we queried the proteins and metabolites against a locally downloaded copy of the DrugBank database 38 (drugbank_5.1.10.db). For proteins, we first downloaded the csv file for all Drug Target Identifiers (https://go.drugbank.com/releases/latest#protein-identifiers). The uniprotID and drugbankID were linked in the csv file. We next used the proteins with an overlapping uniprotID to query the corresponding drugbankID via the drugbankR R package (https://github.com/girke-lab/drugbankR). For metabolites, we first used the hmdbID to query via the hmdbQuery R package (https://github.com/vjcitn/hmdbQuery) and extracted the "drugbank_id" for the final query via drugbankR.

Declarations

Figures
Schematics of the study design on the genetic architecture of the plasma proteome and metabolome in participants with African and European ancestry. Top: The plasma proteins in participants of African and European ancestries were profiled together with the SOMAscan 7k platform. Integrating the abundance of each protein with the array-based genotype data, we identified pQTLs in both ancestries. We further used these pQTLs to prioritize proteins in the ancestry-matched T2D risk via three methods: PWAS, MR, and COLOC. Bottom: The plasma metabolites in participants of African and European ancestries were profiled together with the Metabolon HD4 platform. Integrating the level of each metabolite with the array-based genotype data, we identified mQTLs in both ancestries. We further used these mQTLs to prioritize metabolites in the ancestry-matched T2D risk via three methods: MWAS, MR, and COLOC.

Figure 2 Four

Figure 4 Integration